python list
Prompting for Numerical Sequences: A Case Study on Market Comment Generation
Kawarada, Masayuki, Ishigaki, Tatsuya, Takamura, Hiroya
Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including tables, graphs, and time-series numerical data-to-text settings. While research on generating prompts for structured data such as tables and graphs is gaining momentum, in-depth investigations into prompting for time-series numerical data are lacking. Therefore, this study explores various input representations, including sequences of tokens and structured formats such as HTML, LaTeX, and Python-style codes. In our experiments, we focus on the task of Market Comment Generation, which involves taking a numerical sequence of stock prices as input and generating a corresponding market comment. Contrary to our expectations, the results show that prompts resembling programming languages yield better outcomes, whereas those similar to natural languages and longer formats, such as HTML and LaTeX, are less effective. Our findings offer insights into creating effective prompts for tasks that generate text from numerical sequences.
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
Salinas, Abel, Morstatter, Fred
Large Language Models (LLMs) are regularly being used to label data across many domains and for myriad tasks. By simply asking the LLM for an answer, or "prompting," practitioners are able to use LLMs to quickly get a response for an arbitrary task. This prompting is done through a series of decisions by the practitioner, from simple wording of the prompt, to requesting the output in a certain data format, to jailbreaking in the case of prompts that address more sensitive topics. In this work, we ask: do variations in the way a prompt is constructed change the ultimate decision of the LLM? We answer this using a series of prompt variations across a variety of text classification tasks. We find that even the smallest of perturbations, such as adding a space at the end of a prompt, can cause the LLM to change its answer. Further, we find that requesting responses in XML and commonly used jailbreaks can have cataclysmic effects on the data labeled by LLMs.
Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata
Zhang, Bohui, Reklos, Ioannis, Jain, Nitisha, Peñuela, Albert Meroño, Simperl, Elena
In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.
LLM2KB: Constructing Knowledge Bases using instruction tuned context aware Large Language Models
Nayak, Anmol, Timmapathini, Hari Prasad
The advent of Large Language Models (LLM) has revolutionized the field of natural language processing, enabling significant progress in various applications. One key area of interest is the construction of Knowledge Bases (KB) using these powerful models. Knowledge bases serve as repositories of structured information, facilitating information retrieval and inference tasks. Our paper proposes LLM2KB, a system for constructing knowledge bases using large language models, with a focus on the Llama 2 architecture and the Wikipedia dataset. We perform parameter efficient instruction tuning for Llama-2-13b-chat and StableBeluga-13B by training small injection models that have only 0.05 % of the parameters of the base models using the Low-Rank Adaptation (LoRA) technique. These injection models have been trained with prompts that are engineered to utilize Wikipedia page contexts of subject entities fetched using a Dense Passage Retrieval (DPR) algorithm, to answer relevant object entities for a given subject entity and relation. Our best performing model achieved an average F1 score of 0.6185 across 21 relations in the LM-KBC challenge held at the ISWC 2023 conference.
ChatGPT: Jack of all trades, master of none
Kocoń, Jan, Cichecki, Igor, Kaszyca, Oliwier, Kochanek, Mateusz, Szydło, Dominika, Baran, Joanna, Bielaniewicz, Julita, Gruza, Marcin, Janz, Arkadiusz, Kanclerz, Kamil, Kocoń, Anna, Koptyra, Bartłomiej, Mieleszczenko-Kowszewicz, Wiktoria, Miłkowski, Piotr, Oleksy, Marcin, Piasecki, Maciej, Radliński, Łukasz, Wojtasik, Konrad, Woźniak, Stanisław, Kazienko, Przemysław
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated GPT-4 model on five selected subsets of NLP tasks. We automated ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For GPT-4 model, a loss for semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. It especially refers to pragmatic NLP problems like emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established.
GPT4GEO: How a Language Model Sees the World's Geography
Roberts, Jonathan, Lüddecke, Timo, Das, Sowmen, Han, Kai, Albanie, Samuel
Large language models (LLMs) have shown remarkable capabilities across a broad range of tasks involving question answering and the generation of coherent text and code. Comprehensively understanding the strengths and weaknesses of LLMs is beneficial for safety, downstream applications and improving performance. In this work, we investigate the degree to which GPT-4 has acquired factual geographic knowledge and is capable of using this knowledge for interpretative reasoning, which is especially important for applications that involve geographic data, such as geospatial analysis, supply chain management, and disaster response. To this end, we design and conduct a series of diverse experiments, starting from factual tasks such as location, distance and elevation estimation to more complex questions such as generating country outlines and travel networks, route finding under constraints and supply chain analysis. We provide a broad characterisation of what GPT-4 (without plugins or Internet access) knows about the world, highlighting both potentially surprising capabilities but also limitations.
Python NumPy Tutorial - 2022
So you've learned the basics of Python and you're looking for a more powerful way to analyse data? NumPy is what you need. NumPy is a module for Python that allows you to work with multidimensional arrays and matrices. In addition, NumPy includes support for signal processing and linear algebra operations. So if you need to do any mathematical operations on your data, NumPy is probably the library for you. In this tutorial, we'll show you how to use NumPy to its full potential. You'll learn more about arrays as well as how to operate on them using mathematical functions. NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed. In this Python NumPy Tutorial, we will be learning about NumPy in Python, What is NumPy in Python, Data Types in NumPy, and more. NumPy in Python is a library that is used to work with arrays and was created in 2005 by Travis Oliphant.
Python lists, Numpy arrays and Pandas series
Let's say you have the odd numbers between 1 and 20 and you are storing them in the following ways: Lists, arrays and Pandas series look quite similar at a first glance, so people often ask -- why do we need different data structures? What are the pros and cons and use cases? The purpose of this brief article is to clear up some of that confusion. Lists are one of the 4 built-in data types in Python to store multiple items (3 other data types being dictionaries, tuples and sets). A single list can store multiple data types at once -- integers, floats, strings.
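The snippet above describes storing the odd numbers between 1 and 20 and notes that a list can mix data types. A minimal sketch of the list side of that comparison follows, using only built-in Python. The variable names are illustrative assumptions, not taken from the article, and the NumPy and Pandas variants are left out to keep the example dependency-free.

```python
# Odd numbers between 1 and 20, stored in a plain Python list.
odd_numbers = list(range(1, 20, 2))
print(odd_numbers)

# A single list can hold several data types at once.
mixed = [1, 2.5, "three"]
mixed_types = [type(item).__name__ for item in mixed]
print(mixed_types)

# The other three built-in collection types mentioned in the snippet
# (hypothetical names, shown only for comparison).
as_tuple = tuple(odd_numbers)               # ordered, immutable
as_set = set(odd_numbers)                   # unordered, unique items
as_dict = {n: n * n for n in odd_numbers}   # maps each number to its square
```

A list was chosen here because it needs no imports; the same sequence could be passed to an array or series constructor if those libraries are available.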
TextGenie - Augmenting your text dataset with just 2 lines of code!
Often while developing Natural Language Processing models, we find it difficult to find relevant data. Previously, while developing our Intent Classifier, we used the CLINC150 Dataset that had 100 samples for 150 different classes. But, what if we needed even more samples? One more similar scenario was when I was working on a contextual assistant with Rasa. While creating the training data from scratch, I'd have to imagine different samples for each intent or ask my friends for some help.